-
Notifications
You must be signed in to change notification settings - Fork 53
Add option for a custom alignment file in caprieval #1418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| if align_fname: | ||
| if not Path(align_fname).exists(): | ||
| raise FileNotFoundError(f"Custom alignment file {align_fname} not found") | ||
| align_func = partial(align_custom, custom_alig_file = Path(align_fname)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This here is a bit odd, the function is deciding between structural/sequence alignment - what is the type of this new alignment you are adding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
align_fname is a path to a custom alignment file - for those extremely rare cases when haddock cannot handle alignment, e.g. short DNA and extended DNA.
Here I am avoiding adding new allowed value in alignment_method parameter (would be smth like alignment_method = custom). Instead, I am adding a new parameter align_fname. And so the idea is to chek if user provided a custom alignment file - and also that this file exists - before checkig for the value in alignment_method.
Does it make sence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But not that I am checking - no need for the FileNotFoundError here, libutil will take care of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this is actually a good case for kwargs;
def get_align(method: str, **kwargs) -> partial[dict[str, dict[int, int]]]:
if method == "structure":
lovoalign_exec = kwargs.get("lovoalign_exec")
if lovoalign_exec is None:
raise ValueError("lovoalign_exec is required for structural alginment")
align_func = partial(align_strct, lovoalign_exec=lovoalign_exec)
elif method == "sequence":
align_func = partial(align_seq)
elif method == "custom":
align_fname = kwargs.get("align_fname")
if align_fname is None:
raise ValueError("align_fname is required for structure alignment")
align_func = partial(align_custom, custom_alig_file=Path(align_fname))
else:
available_alns = ("sequence", "structure", "custom")
raise ValueError(
f"Alignment method {method!r} not recognized. "
f"Available options are {', '.join(available_alns)}"
)
return align_funcThen this get_align can be called as:
foo = get_align("structure", lovoalign_exec=exec)
# OR
foo = get_align("custom", align_fname=fname)(I typed the code out of my head maybe the syntax is wrong)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation in my point of view is cleaner and simpler, *in your implementation you are effectively triggering a new type of method by using a parameter which is not "method", here you are using align_fname instead - you see the complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. I do see how the code is cleaner, but I also see a new dependency between method == “custom” and align_fname=fname.
In practice, most if not all users will specify align_fname, while method will remain at its default, so “sequence” - this is smth we’ve seen resently with the per-interface-scoring PR. Sure, it’s possible to handle this mismatch, print error message etc., but wouldn’t it be more straightforward to ignore method var whenever align_fname is defined? Given I add comment in the code to explain this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frequency (or the practice) of a code pathway is not really relevant as the code exists regardless of how many times it is used.
This get_align is a factory function, and the proposed implementation will break this design. So it's not a matter of whether this is documented or not, it's more about design patterns. My suggestion is to stick to the patterns, as they exist for a reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then how do you interupt the run if align_fname=fname and method == “ sequence”?
Raising error in libparallel.py just logs warnings, but the run keeps going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is also by design, parameter validation is done upstream. For this specific case this can be a method in the CAPRI class:
class CAPRI:
def __init__(
# ...
):
# ...
with log_error_and_exit():
self.validate_alignment_parameters()
def validate_alignment_parameters(self):
# handle the validation logic here
if self.params["align_fname"] is not None and method == "sequence":
raise ValueError("if `align_fname` is defined method must be `custom`")
# other checks etc
Checklist
CHANGELOG.mdis updated to incorporate new changesSummary of the Pull Request
Related Issue
#1410
Additional Info
This does work on DNA-ligan example under the condition that full alignment is provided for both receptor and ligand chains. If any given chain is missing from the alignement - this chain is ignored by caprieval completely.